Semantic retrieval of personal photos using matrix factorization and two-layer random walk fusing sparse speech annotations with visual features

Authors

Yuan-ming Liou

Yi-Sheng Fu

Hung-yi Lee

Lin-Shan Lee

Abstract

Letting users retrieve photos from a huge collection with high-level personal queries (e.g. "Uncle Bill's house") is very attractive but technically very challenging. This paper proposes a set of approaches toward that goal, assuming that only 30% of the photos are annotated with sparse, spontaneously spoken descriptions recorded when the photos are taken. We fuse the visual features (visual words plus global visual concepts from the Columbia 374 detector) with the sparse speech features and train latent topics for the photos in the collection using non-negative matrix factorization. The retrieved results are then enhanced by a two-layer mutually reinforced random walk based on different types of features. Because the visual and speech features are fused and the latent topics are trained jointly, it becomes possible to retrieve photos that have no speech annotation, or whose sparse annotations cover a different category of information (e.g. where and who) than the query. Very encouraging results were obtained in initial experiments.
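The pipeline described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy feature matrices, the function names (`nmf`, `transition_matrix`, `two_layer_walk`), the use of Lee-Seung multiplicative updates for the factorization, and the specific update rule of the random walk, in which two layers hold scores for the same photos under two different similarity graphs and each layer is refreshed from the other.

```python
import numpy as np

def nmf(V, k, iters=300, seed=0, eps=1e-9):
    """Factor non-negative V (features x photos) into W (features x k) and
    H (k x photos) with multiplicative updates for the Frobenius objective."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def transition_matrix(X, eps=1e-12):
    """Row-stochastic matrix built from cosine similarity between columns of X."""
    Xn = X / (np.linalg.norm(X, axis=0, keepdims=True) + eps)
    S = Xn.T @ Xn + eps
    return S / S.sum(axis=1, keepdims=True)

def two_layer_walk(s0, P_a, P_b, alpha=0.5, iters=50, eps=1e-12):
    """Assumed form of a mutually reinforced walk: each layer interpolates
    between the initial scores and the other layer's scores propagated
    over its own graph. Returns the average of the two layers."""
    s0 = s0 / (s0.sum() + eps)
    s_a, s_b = s0.copy(), s0.copy()
    for _ in range(iters):
        s_a, s_b = ((1 - alpha) * s0 + alpha * P_a.T @ s_b,
                    (1 - alpha) * s0 + alpha * P_b.T @ s_a)
    return 0.5 * (s_a + s_b)

# Toy collection (made up): 6 photos, 4 visual features, 2 spoken terms.
# Only photos 0 and 3 carry a speech annotation; the other columns are zero.
visual = np.array([[5, 5, 5, 0, 0, 0],
                   [5, 5, 5, 0, 0, 0],
                   [0, 0, 0, 5, 5, 5],
                   [0, 0, 0, 5, 5, 5]], dtype=float)
speech = np.array([[1, 0, 0, 0, 0, 0],
                   [0, 0, 0, 1, 0, 0]], dtype=float)

# Fuse both feature types by stacking them, then learn shared latent topics.
V = np.vstack([visual, speech])
W, H = nmf(V, k=2)

# Score every photo for a query on spoken term 0 through the latent topics,
# so photos without annotations can still receive a non-zero score.
query = np.array([1.0, 0.0])
initial = query @ W[visual.shape[0]:] @ H

# Re-rank with one graph from visual features and one from latent topics.
final = two_layer_walk(initial, transition_matrix(visual), transition_matrix(H))
ranking = np.argsort(-final)
```

Stacking the two feature types into one matrix is what lets a spoken term attach to a latent topic that is also supported by visual evidence; a real system would additionally need feature weighting and a much larger topic count than this toy setting uses.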

Related resources

Semantic retrieval of personal photos using a deep autoencoder fusing visual features with speech annotations represented as word/paragraph vectors

It is very attractive for the user to retrieve photos from a huge collection using high-level personal queries (e.g. “uncle Bill’s house”), but technically very challenging. Previous works proposed a set of approaches toward the goal assuming only 30% of the photos are annotated by sparse spoken descriptions when the photos are taken. In this paper, to promote the interaction between different ...

Voice-based Age and Gender Recognition using Training Generative Sparse Model

Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...

Image Classification via Sparse Representation and Subspace Alignment

Image representation is a crucial problem in image processing, where many low-level representations of images exist, e.g., SIFT and HOG. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis are employed to d...

Semantic Image Annotation and Retrieval with IKen

In this paper we introduce IKen, a platform for image retrieval enhanced by semantic annotations. The paper discusses work in progress and reports the current state of IKen. This comprises the functionalities for annotating photos with an underlying ontology, search features based on these annotations and the development of the domain ontology used for annotation.

Multimodal and Mobile Personal Image Retrieval: A User Study

Mobile phones have become multimedia devices. Therefore it is not uncommon to observe users capturing photos and videos on their mobile phones. As the amount of digital multimedia content expands, it becomes increasingly difficult to find specific images in the device. In this paper, we present our experience with MAMI, a mobile phone prototype that allows users to annotate and search for digit...

Publication date: 2014
